Abstract



The mutual information, I, of the three-state neural network
can be obtained exactly for the mean-field architecture, as a
function of three macroscopic parameters: the overlap, the
neural activity and the activity-overlap, i.e., the overlap
restricted to the active neurons. We perform an expansion of I
in the overlap and the activity-overlap, around their values
for neurons almost independent of the patterns. From this
expansion we obtain an expression for a Hamiltonian which
optimizes the retrieval properties of this system. This
Hamiltonian has the form of a disordered Blume-Emery-Griffiths
model. The dynamics corresponding to this Hamiltonian is
found. As a special characteristic of such a network, we see
that information can survive even if no overlap is present.
Hence the basin of attraction of the patterns and the retrieval
capacity are much larger than for the Hopfield network. The
extremely diluted version is analyzed, the curves of information
are plotted and the phase diagrams are built.

PACS numbers: 87.10, 64.60c

Keywords: Statistical physics, Neural network, Multi-state neuron,
Spin-1 model, Sparse code, Information theory, Dynamical systems,
Blume-Emery-Griffiths



1 Introduction 

The collective properties of neural networks, such as the
storage capacity and the overlap with the memorized patterns,
have been a subject of intensive research in the last
decade [1], [2]. However, more precise measures of their
performance as an associative memory, such as the information
capacity and the basins of attraction of their retrieval states,
have received comparatively less attention J^J-lcj]. For some
models, such as the sparse-code networks M - [Bj or the three-state
networks [10]-[12], where the patterns are not uniformly
distributed, an information-theoretical approach [13]-[15] seems
crucial.

Calculations of the Shannon mutual information (I) for
the sparse-code network were made [16]-[18]. For low storage
of patterns, a few time steps are needed to retrieve them til , .
However, for large storage, only imperfect retrieval is possible.
The closer to saturation, the more time steps are required
for dynamical retrieval. So, first-time-step retrieval is not
enough and it is interesting to study the information capacity
of recurrent networks. To improve I for this recurrent network,
a scheme based on a self-control threshold mechanism
was proposed [19]. This Self-Control Neural Network (SCNN)
is an adaptive scheme induced by the dynamics itself instead
of imposing any external constraint on the activity of the
neurons. Such a procedure successfully increases both I and
the basins of attraction of the patterns. Similar mechanisms
can improve I for three-state low-activity networks [20], with
diluted and fully-connected architectures.

Here we propose a new method, based on direct use of
the I calculated in the mean-field approximation, to obtain
a Hamiltonian which maximizes I within a large range of
values for the activity of the network.

A three-state neural network is defined by the use of
a set of μ = 1, ..., p ternary patterns, {$\xi_i^\mu \in \{0, \pm 1\}$,
i = 1, ..., N}, which are independent random variables given by
the probability distribution

$$p(\xi_i^\mu) = a\,\delta(|\xi_i^\mu|^2 - 1) + (1 - a)\,\delta(\xi_i^\mu), \tag{1}$$

where a is the activity of the patterns ($\xi_i^\mu = 0$ are the
inactive states), so that each of the active states ±1 has
probability a/2. A low-activity three-state neural network
corresponds to the case where the distribution is not uniform,
i.e., a < 2/3. In the limit a = 1 the binary Hopfield model is
reproduced.

The information enclosed in a single unit $\xi_i^\mu$ is given by
the entropy of its probability,

$$H_{\mu i} = -a\ln(a/2) - (1 - a)\ln(1 - a). \tag{2}$$

One can define as sparse a code whose fraction of active neurons
is very small and tends to zero in the thermodynamic
limit B. Sparse-code binary patterns can have a large load
rate $\alpha \sim [a|\ln(a)|]^{-1}$, where α is the ratio between the
number of patterns p and the number of connections per neuron N
p|],[16]. However, the information per unit for the sparse
code is $H_{\mu i} \sim a|\ln(a)| \ll 1$, and it is not clear whether the total
information per connection, $i = \sum_{\mu i} H_{\mu i}/\#\{J_{ij}\} = \alpha H_{\mu i}$,
of such a network is larger than that of the uniform (non-sparse) one.
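As a small numerical illustration of Eq.(2), the sketch below evaluates the entropy per pattern unit for a few activities. The function name `pattern_entropy` and the chosen sample activities are illustrative assumptions, not part of the original text; entropies are in nats.

```python
import math

def pattern_entropy(a):
    """Entropy (in nats) of one ternary pattern unit with activity a, Eq. (2)."""
    h = 0.0
    if a > 0.0:
        h -= a * math.log(a / 2.0)           # two active states, probability a/2 each
    if a < 1.0:
        h -= (1.0 - a) * math.log(1.0 - a)   # one inactive state, probability 1 - a
    return h

# Hypothetical sample activities: sparse, low, intermediate, uniform, binary.
for a in (0.01, 0.1, 0.5, 2.0 / 3.0, 1.0):
    print(f"a = {a:.3f}   H = {pattern_entropy(a):.4f}")
```

The uniform ternary case a = 2/3 gives the maximal entropy ln 3, the binary limit a = 1 gives ln 2, and a sparse code carries very little information per unit, which is the trade-off discussed above.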

Although ternary patterns with low activity have not
been studied to the same extent, they present a similar
behavior [hij. Besides being a step towards an analog neural
model, ternary patterns have the advantage that they can be
generated with a bias while keeping their symmetric distribution
(both ±1 states are considered active). An important question
related to the three-state model is the measurement of the
retrieval quality in the cases where this is imperfect. The
overlap alone is no longer a good measure because it accounts
only for the active states. For the homogeneous ternary patterns,
the Hamming distance can be considered a good measure, since
it takes equally into account all errors in retrieving, the
active and the inactive ones. For the low-activity case, however,
the Hamming distance is also not a good measure of the
retrieval quality, because the errors in retrieving the active
states are much more relevant (they contain much more
information) than the errors in retrieving the inactive states.
To solve this problem, we use the conditional probability of
neuron states given the pattern states [20] to obtain the
mutual information I of the attractor neural network (ANN).
This quantity measures directly the amount of dependence
between the random variables, neurons and patterns. To
accomplish that, we must use a new variable, which we call the
activity-overlap: the overlap between the active states of the
ternary neurons and the active states of the ternary patterns,
taken with the absolute value.

This I is thus a function of three parameters: the overlap
m, the neural activity q, and the activity-overlap n. We
then expand I around the values of the parameters when
the neurons are independent of the patterns. This expansion
gives us an expression that can be interpreted as a Hamiltonian,
a function only of the neuron states and the synaptic
couplings. This Hamiltonian is similar to the Blume-Emery-Griffiths
[21]-[28] spin-1 model (BEG), but with random interactions.
The BEG model, originally proposed to study
$^3$He-$^4$He mixtures, was later used to describe several systems,
like memory alloys, fluid mixtures, micro-emulsions,
etc., and displays a variety of new thermodynamic phases.

Some disordered BEG models have been recently studied
[29]-[31], where either the exchange interactions or the
crystal field are random variables. However, to our knowledge,
no model with random biquadratic interactions has been
treated up to this date.

We describe our model in Section 2. In Section 3 we
describe the information measures used to evaluate the
performance of the ANN, and derive the BEG Hamiltonian from I.
After solving the thermodynamics for this model in Section 4,
we present some results for the dynamics and the phase
diagrams in Section 5, comparing them with previous
works. We conclude in the last Section with some comments
about possible improvements of the network.

2 The model 

Like the pattern states, the neuron states at time t are
three-state variables, defined as

$$\sigma_{it} \in \{0, \pm 1\}, \quad i = 1, \ldots, N. \tag{3}$$

They are updated according to a stochastic dynamics which
depends on the previous states {$\sigma_{j,t-1}$} and on synaptic
interactions between different neurons. The specific form of
the synapses will be obtained later, by construction. We
will see they are of the Hebbian type, that is, the learning is
local (the synapses depend only on the two neurons interacting).
Moreover, the updating rule will also be obtained by
construction; no assumption is made here except that the
patterns have the same three-state symmetry as the neuron
states.

The three-state patterns $\xi_i^\mu \in \{0, \pm 1\}$, μ = 1, ..., p,
are independent identically distributed random variables
(IIDRV) chosen according to the probability distribution
in Eq.(1). There is neither bias ($\langle\xi_i^\mu\rangle = 0$) nor correlation
between patterns ($\langle\xi_i^\mu\xi_i^\nu\rangle = 0$), and $a = \langle|\xi_i^\mu|^2\rangle$ is the activity
of the patterns.
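The statistical properties just listed can be checked on sampled patterns. The following is a minimal sketch, assuming that each active state ±1 is drawn with probability a/2 (as implied by the entropy in Eq.(2)); the helper name `sample_patterns`, the seed and the sizes are hypothetical choices made only for illustration.

```python
import random

def sample_patterns(p, n_sites, a, rng):
    """Draw p ternary patterns: +1 and -1 with probability a/2 each, 0 otherwise."""
    patterns = []
    for _ in range(p):
        row = []
        for _ in range(n_sites):
            u = rng.random()
            if u < a / 2.0:
                row.append(1)
            elif u < a:
                row.append(-1)
            else:
                row.append(0)
        patterns.append(row)
    return patterns

rng = random.Random(0)          # fixed seed so the run is reproducible
a = 0.3                         # assumed pattern activity for this example
xi = sample_patterns(p=2, n_sites=20000, a=a, rng=rng)
n_sites = len(xi[0])

bias = sum(xi[0]) / n_sites                                      # estimates <xi>
activity = sum(x * x for x in xi[0]) / n_sites                   # estimates <xi^2>
correlation = sum(x * y for x, y in zip(xi[0], xi[1])) / n_sites  # estimates <xi^mu xi^nu>
print(f"bias = {bias:.4f}, activity = {activity:.4f}, correlation = {correlation:.4f}")
```

For a finite sample the bias and the correlation fluctuate around zero with a spread of order 1/sqrt(N), while the measured activity fluctuates around a.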

The mean-field networks have the property of being
site-independent, which means that the correlations between
different sites are negligible in the thermodynamic limit, N → ∞.
This implies that every macroscopic quantity satisfies the
conditions of the law of large numbers (LLN), so it can
be defined as an average over the probability distribution of a
state in a single site. If F is the thermodynamic limit of the
variable $F_N$, we have

$$F_N \equiv \frac{1}{N}\sum_i f_i(\sigma_i, \xi_i) \to F = \langle f(\sigma, \xi)\rangle_{\sigma,\xi}, \tag{4}$$

where the brackets represent averages over the distribution
of a single, typical state σ, ξ (we can drop the index i).

An example of this is the special case of the overlap,
but this property is valid for every function $F_N$, since it
comes from a property of the probability distribution of the
states itself,

$$p(\{\sigma_i\}, \{\xi_i\}) = \prod_i p(\sigma_i, \xi_i). \tag{5}$$

The task of retrieval is successful if the distance between
the state of the neurons {$\sigma_{it}$} and the pattern {$\xi_i^\mu$}, defined
as

$$d^\mu_{Nt} \equiv \frac{1}{N}\sum_i |\xi_i^\mu - \sigma_{it}|^2 = a - 2a\,m^\mu_{Nt} + q_{Nt}, \tag{6}$$

becomes small after some time t. This is the so-called Hamming
distance (a Euclidean quadratic measure for discrete
sets). The overlap of the μth pattern with the neuron state
is defined as:

$$m^\mu_{Nt} \equiv \frac{1}{aN}\sum_i \xi_i^\mu\sigma_{it}, \tag{7}$$

while the neural activity is

$$q_{Nt} \equiv \frac{1}{N}\sum_i |\sigma_{it}|^2. \tag{8}$$

The $m^\mu_{Nt}$ are called the retrieval overlaps, and they are
normalized parameters within the interval [-1, 1], which attain
the extreme values $m^\mu_{Nt} = \pm 1$ whenever $\sigma_i = \pm\xi_i^\mu$, as we
see from Eq.(1).

Another parameter is needed to define completely the
macroscopic state of the ANN. This is

$$n^\mu_{Nt} \equiv \frac{1}{aN}\sum_i |\sigma_{it}|^2\,|\xi_i^\mu|^2. \tag{9}$$

We call this quantity the activity-overlap^^ , since
$n^\mu_{Nt}$ represents the overlap between the sites where the
neurons are active, $|\sigma_{it}| = 1$, and the sites where the
patterns are active, $|\xi_i^\mu| = 1$. For the dynamics used in most
works found in the literature [10], [12], [32], where the synapses
used are of the Hopfield form $J_{ij} = \frac{1}{a^2 N}\sum_\mu \xi_i^\mu\xi_j^\mu$, this
parameter $n^\mu_{Nt}$ does not seem to play any role in the evolution
of the network, independently of the architecture considered
(diluted, layered or fully-connected, for instance). However,
$n^\mu_{Nt}$ is necessary to define the mutual information of the
network, and is therefore necessary in computing the network's
performance [33], [34].
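The finite-N definitions in Eqs.(6)-(9) can be evaluated directly on a configuration. The sketch below is only illustrative: the function name `order_parameters`, the noisy test state and the seed are assumptions, and the empirical pattern activity is used in place of a so that the identity in Eq.(6) holds exactly at finite N.

```python
import random

def order_parameters(sigma, xi):
    """Return (a, m, q, n, d) for one neuron state sigma and one pattern xi."""
    n_sites = len(xi)
    a = sum(x * x for x in xi) / n_sites                               # empirical activity
    m = sum(x * s for x, s in zip(xi, sigma)) / (a * n_sites)          # overlap, Eq. (7)
    q = sum(s * s for s in sigma) / n_sites                            # neural activity, Eq. (8)
    n = sum(s * s * x * x for x, s in zip(xi, sigma)) / (a * n_sites)  # activity-overlap, Eq. (9)
    d = sum((x - s) ** 2 for x, s in zip(xi, sigma)) / n_sites         # Hamming distance, Eq. (6)
    return a, m, q, n, d

rng = random.Random(1)
xi = [rng.choice((-1, 0, 0, 1)) for _ in range(1000)]   # activity close to 0.5
# Hypothetical noisy state: keep 80% of the sites, randomize the rest.
sigma = [x if rng.random() < 0.8 else rng.choice((-1, 0, 1)) for x in xi]

a, m, q, n, d = order_parameters(sigma, xi)
print(f"a = {a:.3f}, m = {m:.3f}, q = {q:.3f}, n = {n:.3f}, d = {d:.3f}")
print("d - (a - 2 a m + q) =", d - (a - 2 * a * m + q))
```

Note that d depends only on m and q, so two states with the same Hamming distance can differ in n; this is why the activity-overlap carries independent information.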
For this Hopfield three-state network a self-control
(SC) mechanism was recently introduced with the following
threshold dynamics [20]: $\theta_t = c(a)\Delta_t$, where $c(a) = \sqrt{-2\ln(a)}$
is a function only of the pattern activity, while
the variance of the cross-talk noise (due to the p - 1
non-retrieved patterns) has the simple form $\Delta_t = \sqrt{\alpha q_{Nt}}$ for
the diluted architecture tea]. Here we take the alternative
approach of starting from the mutual information for the
model, which we describe in the next Section.



3 Mutual Information 

3.1 Mean-Field-Theory 

Compared to the binary neural network (NN), where the
natural parameter is the overlap $m^\mu$, two additional parameters
are needed to describe the statistical macro-dynamics of the
three-state NN. Although the only variables
appearing in the usual Hopfield dynamics are the overlap
$m^\mu_{Nt}$ and the neural activity $q_{Nt}$, the activity-overlap $n^\mu_{Nt}$ is
an independent parameter which completes the macroscopic
description. For a long-range system, such as the one we are
considering, it is enough to observe the distribution of a single
typical neuron in order to know the global distribution.

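The conditional probability of having a neuron in a state
$\sigma_{it} = \sigma$ at a time t, given that in the same site the pattern
being retrieved is $\xi_i^\mu = \xi$, is:

$$p(\sigma|\xi) = (s_\xi + m\xi\sigma)\,\delta(\sigma^2 - 1) + (1 - s_\xi)\,\delta(\sigma),$$
$$s_\xi \equiv s + \frac{n - q}{1 - a}\,\xi^2, \qquad s \equiv \frac{q - na}{1 - a}. \tag{10}$$

One can verify that this probability satisfies the averages:

$$m = \left\langle\frac{\xi}{a}\langle\sigma\rangle_{\sigma|\xi}\right\rangle_\xi, \qquad
q = \left\langle\langle\sigma^2\rangle_{\sigma|\xi}\right\rangle_\xi, \qquad
n = \left\langle\frac{\xi^2}{a}\langle\sigma^2\rangle_{\sigma|\xi}\right\rangle_\xi. \tag{11}$$

These are the thermodynamic limits (N → ∞) of Eqs.(7), (8), (9),
for a given time t and pattern μ. Due to the symmetry
of the patterns, we also have
$\langle\langle\sigma\rangle_{\sigma|\xi}\rangle_\xi = \langle\langle\sigma^2\rangle_{\sigma|\xi}\,\xi\rangle_\xi = \langle\langle\sigma\rangle_{\sigma|\xi}\,\xi^2\rangle_\xi = 0$.
The averages are over the pattern distribution, Eq.(1), and
over the conditional distribution, Eq.(10):

$$\langle\langle\cdot\rangle_{\sigma|\xi}\rangle_\xi = \sum_\xi p(\xi)\sum_\sigma p(\sigma|\xi)\,(\cdot) \equiv \langle\cdot\rangle_{\sigma,\xi}. \tag{12}$$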



Together with the distribution of the patterns, the
conditional probability leads also to the probability

$$p(\sigma) \equiv \sum_\xi p(\xi)\,p(\sigma|\xi) = q\,\delta(\sigma^2 - 1) + (1 - q)\,\delta(\sigma). \tag{13}$$

With the above expressions we can calculate the mutual
information I [13], [15], an information-theoretical quantity
used to measure the average amount of information that can
be received by the user by observing the symbol (or the
signal) at the output of a channel. We can regard the whole
dynamical process, or rather each time step of it, as a channel,
and write I as:

$$I[\sigma; \xi] = S[\sigma] - \langle S[\sigma|\xi]\rangle_\xi; \tag{14}$$

S[σ] and S[σ|ξ] are the entropy of the output and the
conditional entropy of the output, respectively. The quantity
$\langle S[\sigma|\xi]\rangle_\xi$ is also called the equivocation term of I[σ; ξ].

For an ANN of homogeneously distributed patterns, I
is not a necessary measure, because the Hamming distance
is enough to quantify the quality of the retrieval. The latter
distinguishes well between a situation where most of the
wrong neurons were turned off (d = 1) and another situation
where they were flipped (d = 4). However, for the
low-activity ANN, I can be a very useful measure, since the
Hamming distance is not so good at distinguishing the cases
where neurons were turned off from the cases where they were
turned on (d = 1 in both). This distinction is critical in the
sparse-coded three-state NN, where a ≪ 1, because the inactive
states carry less information than the active ones.

For instance, consider an ANN with pattern activity a and
denote the active (inactive) sites as A (I), such that $\xi_A = \pm 1$
($\xi_I = 0$). Now suppose that all neurons were turned off,
$\sigma_i = 0$; then m = 0, n = 0 and q = 0, so the Hamming
distance is d = a and there is no information transmitted,
I = 0. If, instead of turning off the A = aN active neurons
$\sigma_A$, one had turned on A neurons among the inactive $\sigma_I$,
one gets m = 1, n = 1 and q = 2a. So the Hamming distance
is still d = a, but now there is some transmitted information.
Intuitively, the first kind of error has erased all the
meaningful bits, while the second has left the code essentially
intact and is obviously much less important.

The expressions for the entropies defined above are:

$$S[\sigma] = -q\ln\frac{q}{2} - (1 - q)\ln(1 - q),$$
$$\langle S[\sigma|\xi]\rangle_\xi = a\,S_a + (1 - a)\,S_{1-a},$$
$$S_a = -\frac{n + m}{2}\ln\frac{n + m}{2} - \frac{n - m}{2}\ln\frac{n - m}{2} - (1 - n)\ln(1 - n),$$
$$S_{1-a} = -s\ln\frac{s}{2} - (1 - s)\ln(1 - s). \tag{15}$$

Applying these to the second case cited above, the entropy of
the output is S[σ] = -2a ln a - (1 - 2a) ln(1 - 2a), while the
equivocation is ⟨S[σ|ξ]⟩ = -a[ln a - ln 2 - ln(1 - a)] -
(1 - 2a)[ln(1 - 2a) - ln(1 - a)], so that the mutual information
is I = -a ln(2a) - (1 - a) ln(1 - a) = S[ξ] - 2a ln(2), which
is not much smaller than the entropy of the original patterns,
S[ξ], Eq.(2). This makes clear why we must use I for
the sparse-code case, instead of the Hamming distance.
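The two error scenarios discussed in this subsection can be checked numerically. The sketch below evaluates I(m, q, n; a) from the entropies of the output and of the equivocation, using s = (q - na)/(1 - a) and the convention 0 ln 0 = 0; the function names and the value a = 0.1 are illustrative assumptions.

```python
import math

def ent(p, w=1.0):
    """Return -p * ln(p / w), with the convention 0 * ln(0) = 0."""
    return 0.0 if p <= 0.0 else -p * math.log(p / w)

def h3(x):
    """Entropy of a ternary variable: +1 or -1 with prob. x/2 each, 0 with prob. 1 - x."""
    return ent(x, 2.0) + ent(1.0 - x)

def mutual_information(m, q, n, a):
    """I = S[sigma] - <S[sigma|xi]>, in nats, for given overlap m, activity q, activity-overlap n."""
    s = (q - n * a) / (1.0 - a)              # activity on the inactive pattern sites
    s_out = h3(q)                            # entropy of the output
    s_act = ent((n + m) / 2.0) + ent((n - m) / 2.0) + ent(1.0 - n)   # active pattern sites
    s_inact = h3(s)                          # inactive pattern sites
    return s_out - a * s_act - (1.0 - a) * s_inact

a = 0.1
all_off = mutual_information(m=0.0, q=0.0, n=0.0, a=a)       # every neuron turned off
extra_on = mutual_information(m=1.0, q=2 * a, n=1.0, a=a)    # aN inactive sites turned on
perfect = mutual_information(m=1.0, q=a, n=1.0, a=a)         # perfect retrieval
print(f"all off: I = {all_off:.4f}")
print(f"extra on: I = {extra_on:.4f}")
print(f"perfect: I = {perfect:.4f}")
```

Both faulty states have the same Hamming distance d = a, yet the first carries no information while the second keeps S[ξ] - 2a ln 2, in agreement with the closed-form result quoted in the text.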



3.2 Derivation of the Hamiltonian 

We search for a Hamiltonian which is symmetric under any
permutation of the patterns $\xi^\mu$, since they are not known
during the retrieval process. This imposes that the retrieval of
any pattern $\xi^\mu$ is weak, i.e., σ is almost independent of it.
Then obviously the overlap $m^\mu \sim 0$. An expansion of I with
a = 1 = q around $m^\mu \sim 0$ yields the Hopfield Hamiltonian. If
afterwards some particular overlap eventually becomes large,
this should be a consequence of the network evolution.

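However, for general a, q, this is not the only quantity
which vanishes in this limit. The variable $\sigma^2$ is also almost
independent of $(\xi^\mu)^2$, so that $n^\mu \sim q$. Hence, the parameter

$$l^\mu \equiv \frac{n^\mu - q}{1 - a} = \langle\sigma^2\eta^\mu\rangle, \qquad
\eta^\mu \equiv \frac{(\xi^\mu)^2 - a}{a(1 - a)}, \tag{16}$$

also vanishes when the states of the neurons and the patterns
are independent.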

We use this fact to look at the information close to the
non-retrieval regime. An expansion of the expression for
I around $m^\mu = 0$, $l^\mu = 0$ gives

$$I^\mu \approx \frac{1}{2}\frac{a}{q}(m^\mu)^2 + \frac{1}{2}\frac{a(1 - a)}{q(1 - q)}(l^\mu)^2. \tag{17}$$

Since this expression gives the information for a single
site i of a single pattern μ, $I(m^\mu, l^\mu) = I^\mu$, it should be
summed, $I_{pN} = N\sum_\mu I^\mu$, to give the total information of
the network. It is natural to associate this quantity with the
opposite of the Hamiltonian, because the maximum of the
information gives the minimal energy.
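The quadratic expansion can be compared numerically with the exact mutual information of Eqs.(14)-(15). This is a hedged sketch: it parametrizes the state by (m, l, q), recovers n = q + (1 - a) l and s = q - a l from the definitions of l and s, and uses illustrative values a = 0.3, q = 0.4 that are not taken from the paper.

```python
import math

def ent(p, w=1.0):
    """Return -p * ln(p / w), with the convention 0 * ln(0) = 0."""
    return 0.0 if p <= 0.0 else -p * math.log(p / w)

def exact_information(m, l, q, a):
    """Exact I from the output entropy minus the equivocation, in terms of (m, l, q)."""
    n = q + (1.0 - a) * l        # inverts l = (n - q) / (1 - a)
    s = q - a * l                # follows from q = a n + (1 - a) s
    s_out = ent(q, 2.0) + ent(1.0 - q)
    s_act = ent((n + m) / 2.0) + ent((n - m) / 2.0) + ent(1.0 - n)
    s_inact = ent(s, 2.0) + ent(1.0 - s)
    return s_out - a * s_act - (1.0 - a) * s_inact

def expanded_information(m, l, q, a):
    """Leading quadratic terms of I around m = 0, l = 0."""
    return 0.5 * (a / q) * m ** 2 + 0.5 * a * (1.0 - a) / (q * (1.0 - q)) * l ** 2

a, q = 0.3, 0.4                  # assumed activities for the comparison
for eps in (0.1, 0.03, 0.01):
    exact = exact_information(eps, eps, q, a)
    approx = expanded_information(eps, eps, q, a)
    print(f"m = l = {eps}: exact = {exact:.3e}, expansion = {approx:.3e}")
```

The two values approach each other as m and l shrink, which is the regime in which the Hamiltonian is derived.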

We suppose, as a further simplification of the model,
that the neural activity is of the same order as the pattern
activity, q ~ a. With this assumption, I from Eq.(17) depends
in the same way on $m^\mu$ and $l^\mu$. Substituting the expressions
for these parameters, given by the definitions (7), (8) and (9)
(i.e., Eqs.(11) before the thermodynamic limit), we obtain
the following expression:

$$H = -I_{pN} = H_1 + H_2, \tag{18}$$

where

$$H_1 = -\frac{1}{2}\sum_{i,j} J_{ij}\,\sigma_i\sigma_j \tag{19}$$

and

$$H_2 = -\frac{1}{2}\sum_{i,j} K_{ij}\,\sigma_i^2\sigma_j^2 \tag{20}$$

are the quadratic and the biquadratic terms, respectively.
The above expression for the Hamiltonian, obtained from the
mutual information close to the non-retrieval regime, has the
same form as that of the BEG model [21]. We call our model the
BEG Neural Network (BEGNN).

The interactions are randomly distributed, given by

$$J_{ij} = \frac{1}{a^2 N}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu \tag{21}$$

and

$$K_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\eta_i^\mu\eta_j^\mu. \tag{22}$$

The first term of the Hamiltonian is the usual Hopfield
model with the Hebbian rule given by Eq.(21). The second
term, arising from the term depending on $l^\mu$ in Eq.(17) and
related to the activity-overlap, is also Hebbian-like, but is
associated, as will be seen later, with the quadrupolar order
of the system.

Note that the Hamiltonian formulation of the problem
is only possible in the case of a fully-connected neural
network, where the interaction matrix is symmetric. In the
next Section we will present the dynamical formulation of
the problem, which can be applied to the cases of asymmetric
couplings [35].
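To make the link between the couplings and the order parameters concrete, the following sketch builds the Hebbian-like J and K for a small random network and checks that H = H_1 + H_2 equals -(N/2) Σ_μ [(m^μ)² + (l^μ)²]. The sizes, the seed and the inclusion of the self-interaction terms i = j are assumptions made so that the identity is exact at finite N; they are not prescriptions from the text.

```python
import random

def overlaps(sigma, xi, a):
    """Finite-N m^mu and l^mu for every stored pattern."""
    n = len(sigma)
    m = [sum(x * s for x, s in zip(row, sigma)) / (a * n) for row in xi]
    l = [sum(s * s * (x * x - a) for x, s in zip(row, sigma)) / (a * (1.0 - a) * n)
         for row in xi]
    return m, l

def hamiltonian(sigma, xi, a):
    """H = -1/2 sum_ij J_ij s_i s_j - 1/2 sum_ij K_ij s_i^2 s_j^2 (self-terms included)."""
    n, p = len(sigma), len(xi)
    eta = [[(x * x - a) / (a * (1.0 - a)) for x in row] for row in xi]
    energy = 0.0
    for i in range(n):
        for j in range(n):
            j_ij = sum(xi[mu][i] * xi[mu][j] for mu in range(p)) / (a * a * n)
            k_ij = sum(eta[mu][i] * eta[mu][j] for mu in range(p)) / n
            energy -= 0.5 * j_ij * sigma[i] * sigma[j]
            energy -= 0.5 * k_ij * sigma[i] ** 2 * sigma[j] ** 2
    return energy

rng = random.Random(2)
a, n_sites, p = 0.5, 40, 3       # small hypothetical network
xi = [[rng.choice((-1, 0, 0, 1)) for _ in range(n_sites)] for _ in range(p)]
sigma = [rng.choice((-1, 0, 1)) for _ in range(n_sites)]

m, l = overlaps(sigma, xi, a)
h_direct = hamiltonian(sigma, xi, a)
h_from_overlaps = -0.5 * n_sites * sum(mm * mm + ll * ll for mm, ll in zip(m, l))
print(h_direct, h_from_overlaps)
```

Writing the energy through the overlaps makes explicit that lowering H means increasing the quadratic form of Eq.(17) summed over patterns.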

As is well known, the phase diagram of the usual BEG
model is very rich, showing different phases depending on
the sign and the strength of the biquadratic coupling constant.
Without any disorder and for very negative biquadratic
coupling constant, a quadrupolar phase, related to
the quadrupolar moment ⟨σ²⟩, also appears, apart from the
usual disordered and ferromagnetic phases |B2|-|Bg|. However,
our variables $\xi_i^\mu$ are quenched, so we have a disordered
system. BEG models with disordered quadratic coupling
have been recently studied [29]-[31], showing some new
phases (spin-glass, quadrupolar spin-glass phases, etc.), but,
to our knowledge, no disordered biquadratic BEG model
has been studied up to this date.



4 Asymptotic Macro-dynamics



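For the derivation of the asymptotic macro-dynamics we will
use a naive mean-field (MF) approach using the Hamiltonian
in Eqs.(18)-(20). Since the Hamiltonian is quadratic in
the overlaps, we can linearize it, using a Gaussian
transformation, to obtain the partition function:

$$Z = \mathrm{Tr}_{\{\sigma\}}\, e^{-\beta H}
= \int\prod_\mu\left[D\Phi(\sqrt{\beta N}\,m^\mu)\,D\Phi(\sqrt{\beta N}\,l^\mu)\right]
\prod_i\sum_{\sigma_i = \pm 1, 0} e^{-\beta\bar{H}_i}, \tag{23}$$

where $D\Phi(z) = dz\,e^{-z^2/2}/\sqrt{2\pi}$, and β = 1/T. The effective
Hamiltonian is

$$\bar{H}_i = -h_i\sigma_i - \theta_i\sigma_i^2, \tag{24}$$

where the local fields are

$$h_i = \sum_j J_{ij}\sigma_j = \sum_\mu\frac{\xi_i^\mu}{a}\,m^\mu, \qquad
\theta_i = \sum_j K_{ij}\sigma_j^2 = \sum_\mu\eta_i^\mu\,l^\mu. \tag{25}$$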



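After taking the trace over the spin variables, we apply
a saddle-point integration and use Eq.(4) for the thermodynamic
limit, to get the free energy in terms of the parameters
m, l and q:

$$f = -\frac{1}{\beta N}\ln Z = \frac{1}{2}\sum_\mu\left[(m^\mu)^2 + (l^\mu)^2\right] - T\langle\ln\bar{Z}\rangle_\xi, \tag{26}$$

where the effective partition function is:

$$\bar{Z} = 1 + 2e^{\beta\theta}\cosh(\beta h). \tag{27}$$

The fields h, θ are defined in Eq.(25), but the indices i can
be dropped. The saddle-point equations $\partial f/\partial m^\mu = 0$
and $\partial f/\partial l^\mu = 0$ lead to the following expressions for the
stationary states:

$$m^\mu = \left\langle\frac{\xi^\mu}{a}\,\bar\sigma\right\rangle_\xi, \qquad
l^\mu = \left\langle\eta^\mu\,\overline{\sigma^2}\right\rangle_\xi, \tag{28}$$

where the angular brackets mean the average over the patterns,
and the thermal averages of the states are:

$$\bar\sigma = F_\beta(h, \theta) = \frac{2e^{\beta\theta}\sinh(\beta h)}{\bar{Z}}, \qquad
\overline{\sigma^2} = G_\beta(h, \theta) = \frac{2e^{\beta\theta}\cosh(\beta h)}{\bar{Z}}. \tag{29}$$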
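The closed forms of the thermal averages can be verified by an explicit trace over the three states of a single neuron. The sketch assumes the sign convention in which the single-site Boltzmann weight is exp[β(hσ + θσ²)], which is the one consistent with the effective partition function 1 + 2e^{βθ}cosh(βh); the function names and the sample field values are hypothetical.

```python
import math

def thermal_averages_by_trace(h, theta, beta):
    """<sigma> and <sigma^2> from an explicit sum over sigma in {-1, 0, +1}."""
    states = (-1, 0, 1)
    weights = [math.exp(beta * (h * s + theta * s * s)) for s in states]
    z = sum(weights)
    mean_sigma = sum(w * s for w, s in zip(weights, states)) / z
    mean_sigma2 = sum(w * s * s for w, s in zip(weights, states)) / z
    return mean_sigma, mean_sigma2

def f_beta(h, theta, beta):
    """Closed form of <sigma>."""
    z = 1.0 + 2.0 * math.exp(beta * theta) * math.cosh(beta * h)
    return 2.0 * math.exp(beta * theta) * math.sinh(beta * h) / z

def g_beta(h, theta, beta):
    """Closed form of <sigma^2>."""
    z = 1.0 + 2.0 * math.exp(beta * theta) * math.cosh(beta * h)
    return 2.0 * math.exp(beta * theta) * math.cosh(beta * h) / z

h, theta, beta = 0.7, -0.3, 2.5          # arbitrary sample values
print(thermal_averages_by_trace(h, theta, beta))
print(f_beta(h, theta, beta), g_beta(h, theta, beta))
```

For large β both functions sharpen into step-like responses: the neuron is active when |h| + θ > 0 and takes the sign of h.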



For zero temperature, the behavior of the averages is:

$$F_\infty(h, \theta) = \mathrm{sign}(h)\,\Theta(|h| + \theta), \qquad
G_\infty(h, \theta) = \Theta(|h| + \theta), \tag{30}$$

where Θ(...) is the step function.

This result, obtained from the naive MF theory, is
easily understood if we write the Hamiltonian in Eqs.(19), (20)
in the form:

$$H = -\frac{1}{2}\sum_i h_i\sigma_i - \frac{1}{2}\sum_i\theta_i\sigma_i^2. \tag{31}$$

So, the deterministic parallel dynamics, which leads to the
minimization of the Hamiltonian, is

$$\sigma_{i,t+1} = \mathrm{sign}(h_{it})\,\Theta(|h_{it}| + \theta_{it}), \tag{32}$$

where the local fields $h_{it}$, $\theta_{it}$ (associated to the variables
$\sigma_{jt}$, $\sigma_{jt}^2$ respectively) are given in the time step t. Such
dynamics has the same form as the zero-temperature function
in Eq.(30).

Alternatively to the thermodynamic approach, in the
noisy case, we can also start from the stochastic parallel
dynamics [12], [j^J:

$$p(\sigma_{i,t+1}|\{\sigma_t\}) = \exp[-\beta\bar{H}_{it}]/\bar{Z}, \tag{33}$$

where $\bar{H}_{it}$ is given by Eq.(24) (in the time step t), and $\bar{Z}$
by Eq.(27). Differently from the dynamics for the (Q=3)
Ising model [12], [33], here the field $\theta = \theta(\{\sigma_t^2\})$ in the
effective Hamiltonian is a function of the states in the previous
time steps. The resulting noise-averaged states coincide with
Eqs.(29) in the stationary regime.

Because we are mainly interested in the retrieval properties
of our network, we take an initial configuration whose
retrieval overlaps are macroscopic, of order O(1), only for a
given pattern, say the first one. We single out the term
μ = 1 in the local fields of Eq.(25) in order to study the
retrieval of the first pattern.

Supposing an initial configuration {$\sigma_{i,t=0}$} as a collection
of IIDRV with zero mean and variance $q_{t=0}$, the fields
$h_{t=0}$ and $\theta_{t=0}$ in the zeroth time step are given by:

$$h_{t=0} = \frac{\xi}{a}\,m_{t=0} + \omega_{t=0}, \qquad
\omega_{t=0} \equiv \sum_{\nu=2}^{p}\frac{\xi^\nu}{a}\,m^\nu_{t=0},$$
$$\theta_{t=0} = \eta\,l_{t=0} + \Omega_{t=0}, \qquad
\Omega_{t=0} \equiv \sum_{\nu=2}^{p}\eta^\nu\,l^\nu_{t=0}, \tag{34}$$

where the indices μ = 1 were dropped, and the rest of
the patterns is regarded as some additive noise. According
to the Central Limit Theorem (CLT), they are independent
Gaussian distributed [12] with zero mean and variance

$$\mathrm{Var}[\omega_{t=0}] = \frac{\alpha}{a^2}\,q_{t=0} \equiv \Delta^2_{t=0}, \qquad
\mathrm{Var}[\Omega_{t=0}] = \frac{\Delta^2_{t=0}}{(1 - a)^2}. \tag{35}$$

Although the dynamics for the parameters $m_t$, $n_t$ and
$q_t$ in the first time step is a function of the initial step, the
expression for the noises in the next steps evolves with time
in a more complicated way than Eqs.(35). In the extremely
diluted synaptic case [35], however, the first time step
describes the dynamics for every time step t. From now on we
will adopt this limiting case.

Thus, in the asymptotic limit N → ∞, the expression
for the overlap $m_t = \lim_{N\to\infty} m_{Nt}$ becomes, after averaging
over the pattern ξ:

$$m_{t+1} = \left\langle\frac{\xi}{a}\,\bar\sigma_{t+1}\right\rangle_{\xi,\omega,\Omega}
= \int D\Phi(y)\int D\Phi(z)\,F_\beta\!\left(\frac{m_t}{a} + y\Delta_t;\ \frac{l_t}{a} + z\frac{\Delta_t}{1 - a}\right), \tag{36}$$

where the averages over ω, Ω in the brackets should be done
with the Gaussian distributions, Eq.(35).

The neural activity is the thermodynamic limit of
Eq.(8), which reads

$$q_t = \left\langle\overline{\sigma_t^2}\right\rangle_{\xi,\omega,\Omega} = a\,n_t + (1 - a)\,s_t,$$
$$s_{t+1} = \int D\Phi(y)\int D\Phi(z)\,G_\beta\!\left(y\Delta_t;\ -\frac{l_t}{1 - a} + z\frac{\Delta_t}{1 - a}\right). \tag{37}$$

Here s is the variable defined in Eq.(10) and the
activity-overlap is given by

$$n_{t+1} = \left\langle\frac{\xi^2}{a}\,\overline{\sigma_{t+1}^2}\right\rangle_{\xi,\omega,\Omega}
= \int D\Phi(y)\int D\Phi(z)\,G_\beta\!\left(\frac{m_t}{a} + y\Delta_t;\ \frac{l_t}{a} + z\frac{\Delta_t}{1 - a}\right). \tag{38}$$

The equation for $l_t$ is obtained using the definition in
Eq.(16), $l_t = (n_t - q_t)/(1 - a)$. It is worth noting that the
definitions of the parameters m, q, n in Eqs.(11) are the same
as those in Eqs.(36)-(38), since the average over the conditional
probability p(σ|ξ) is equivalent to the average over the noise
due to the p - 1 remaining patterns. Eqs.(36)-(38) describe
the macro-dynamics of the diluted BEGNN by adapting
self-consistently the threshold during the time evolution of the
system. With these equations we can calculate the mutual
information from Eqs.(14)-(15).
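As an illustration of how such a macro-dynamics can be iterated, the sketch below implements one zero-temperature step for the extremely diluted case, assuming the following reading of the recursion: on active pattern sites the fields are h = m/a + yΔ and θ = l/a + zΔ/(1 - a), on inactive sites h = yΔ and θ = -l/(1 - a) + zΔ/(1 - a), with Δ² = αq/a², and a neuron is active when |h| + θ > 0. The Gaussian averages are approximated with a simple normalized midpoint grid; the grid size, cutoff, function names and parameter values are assumptions chosen for illustration only.

```python
import math

def gauss_grid(n_points=61, cutoff=6.0):
    """Midpoint grid approximating the Gaussian measure; the weights sum to 1."""
    step = 2.0 * cutoff / n_points
    nodes = [-cutoff + (k + 0.5) * step for k in range(n_points)]
    raw = [math.exp(-0.5 * z * z) for z in nodes]
    total = sum(raw)
    return [(z, w / total) for z, w in zip(nodes, raw)]

def sign(x):
    return (x > 0) - (x < 0)

def step_zero_temperature(m, l, q, a, alpha, grid):
    """One parallel update of (m, l, q) at T = 0 for the extremely diluted network."""
    delta = math.sqrt(alpha * q) / a
    m_new = n_new = s_new = 0.0
    for y, wy in grid:
        for z, wz in grid:
            w = wy * wz
            theta_noise = z * delta / (1.0 - a)
            # Active pattern site (xi = +1; xi = -1 contributes equally by symmetry).
            h = m / a + y * delta
            if abs(h) + l / a + theta_noise > 0.0:
                m_new += w * sign(h)
                n_new += w
            # Inactive pattern site (xi = 0).
            if abs(y * delta) - l / (1.0 - a) + theta_noise > 0.0:
                s_new += w
    q_new = a * n_new + (1.0 - a) * s_new
    l_new = (n_new - q_new) / (1.0 - a)
    return m_new, l_new, q_new

grid = gauss_grid()
a, alpha = 0.5, 0.01                      # hypothetical activity and load
m, l, q = 1.0, 1.0, a                     # start at the stored pattern
for t in range(5):
    m, l, q = step_zero_temperature(m, l, q, a, alpha, grid)
    print(f"t = {t + 1}: m = {m:.6f}, l = {l:.6f}, q = {q:.6f}")
```

At this small load the retrieval state is stable, so the parameters stay at m ≈ 1, l ≈ 1, q ≈ a; a finer quadrature (or a finite-temperature version using the thermal averages instead of the step functions) would be needed for quantitative results near the transitions.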



5 Phase diagram 



In this Section we present some explicit results for the
BEGNN model. We first calculated the stable fixed points
of Eqs.(36)-(38) for the asymptotic N → ∞ network, and
obtained the curves for the order parameters m, q, n and the
information i = Iα as a function of the load parameter
α for two values of the activity a (Fig. 1). For small load
(α < 0.2), the overlap remains close to m ~ 1 and the neural
activity is q ~ a. When more patterns are stored in the
network, i increases almost linearly, up to an optimal value,
$i_{opt}(\alpha_{opt})$, after which i decreases to zero at $\alpha_{max}$. The
comparison is done with the self-control neural network (SCNN)
model [19], [20]. It is seen that for small activities (a = 0.3),
the BEGNN model gives worse results than the
SCNN model, with a smaller value for i, while for a = 0.6
(close to the uniform distribution of patterns, a = 2/3), the
BEGNN performs better, with an optimal value of the
information i ~ 0.15, although it is attained for a smaller value of
load, α ~ 0.2. The reason for this behavior is that the third
order parameter (related to the activity-overlap) is l ~ 1 for
the BEGNN (SCNN) and l < 1 for the SCNN (BEGNN)
with a = 0.6 (a = 0.3).

The behavior of the order parameters and of i with the
load, for the zero-temperature case, is presented for three
different values of the activity (Fig. 2). The initial conditions
used were $m_0 = 10^{-6}$, $l_0 = 1$, $q_0 = a$, such that there is
almost no initial overlap. In this case there is always a sharp
fall of the information for α not much larger than $\alpha_{opt}$. We see
different behaviors depending on the activities.

The corresponding dynamical phase diagram is drawn
in Fig. 3. Four possible phases are present: the retrieval R
(m ≠ 0, l ≠ 0, q ~ a) and M (m ≠ 0, l < 0.5m, q ~ a)
phases, the quadrupolar phase Q (m = 0, l ≠ 0, q ~ a) and
the zero phase Z (m = 0, l = 0, q ~ a). The last phase
Z, so called because there is no information transmitted,
is analogous to the self-sustained (S) activity phase of the
(Q=3)-Ising ANN [12], [33], since the parameter related to the
spin-glass order is q ≠ 0. We have not found any paramagnetic
(P) phase, with all of m = 0, l = 0, q = 0, for the BEGNN.
Note that the quadrupolar phase is a quite new phase compared
to the other NN models and is specific to the
BEGNN model. This phase is also present in the original
BEG model [21], as well as in all its generalizations including
disorder [29]-[31]. It is seen in Fig. 2, for a = 0.9, where the
overlap goes to m = 0 at α ~ 0.13 (much before l, which goes
to zero at α ~ 0.3); this phase corresponds to non-zero
information, although there is no retrieval overlap. The phase
R appears for a = 0.5, where both m and l are large and so
is i. On the other hand, the phase M is observed for a = 0.1,
where the parameter l is much smaller than m. The phase
transitions from R or M to Z are usually sharp.
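The classification of the four phases can be written as a small helper for labelling numerical fixed points. This is only a sketch: the tolerance `tol` used to decide whether an order parameter counts as zero, and the choice to label a state R whenever m ≠ 0 and l ≥ 0.5|m|, are assumptions introduced here, not criteria stated in the text.

```python
def classify_phase(m, l, tol=1e-3):
    """Label a fixed point as 'R', 'M', 'Q' or 'Z' from its overlap m and parameter l."""
    has_overlap = abs(m) > tol
    has_l = abs(l) > tol
    if has_overlap:
        # M: overlap present but l much smaller than m (l < 0.5 m); otherwise R.
        return "M" if l < 0.5 * abs(m) else "R"
    # No overlap: quadrupolar phase if l survives, zero phase otherwise.
    return "Q" if has_l else "Z"

# Hypothetical fixed points, one per phase.
for m, l in ((0.9, 0.8), (0.9, 0.1), (0.0, 0.6), (0.0, 0.0)):
    print(f"m = {m}, l = {l} -> {classify_phase(m, l)}")
```

Such a helper could be applied to the stationary values of the macro-dynamics on a grid of activities and loads to sketch a phase diagram.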

The behavior of the order parameters and the information
with the temperature T for fixed activity a = 0.5
is shown in Fig. 4. We observe an increase of i with the
temperature, showing an optimal value for T ~ 0.2. Such
an improvement of a feeble signal with noise, similar to the
stochastic resonance phenomenon, also appears in other
physical systems |$q|. A further increase in temperature leads to
a decrease in the information of the model. We note that this
behavior does not hold for a > 2/3; nevertheless there is still
an increase of the storage capacity $\alpha_{max}$. The last result is in
agreement with other investigations of the dynamical activity of
real and model neurons, where the observed stochastic
resonance disappears when the amplitude of the external
stimulus is increased.

A cut of the phase diagram in the plane T × α for a fixed
value of the activity a = 0.7 is shown in Fig. 5. The dashed
line, which corresponds to the optimal case, $i_{opt}(\alpha)$, is within
either the phase R or Q. It is also interesting to observe
that there are two separate Q-phase islands, for either small
temperature T and large load α or large temperature T and
small load α. The phase transitions become smoother with
the temperature.

Finally, in Fig. 6 we present the evolution of the information
and of the order parameters with the time t, for a given
temperature T = 0.2 and activity a = 0.7, for two values of
the load parameter α. As can be seen from this figure, for
α = 0.4, which is close to the transition R-M, the change in
the behavior of the order parameters needs more time steps
than for α = 0.2. This is not strange, due to the critical
slowing down near the transition. However, an interesting
new fact appears here: the parameters $l_t$ and $q_t$ fall quickly
to a much smaller value, after which the network
stays a long while with an almost zero overlap, and finally
the BEGNN is able to retrieve the pattern quite well. For
instance, for α = 0.2, l falls to l ~ 0.6 and m stays near
m ~ 0 during the first t ~ 20 time steps, then they jump up
to l ~ 0.8, m ~ 0.9, which means the memory pattern was
(partially) attained. This result, caused by the instability
of the Z-phase in this region, makes the BEGNN capacity
much larger than that of the usual Hopfield model, in all its
versions as far as we know.

The behavior of the continuous phase transitions can be
studied analytically within the mean-field approximation by
expanding Eqs.(^-i^) for small values of the order parameters.
A standard calculation, for example, for the transition
line QZ (m = 0, l ≪ 1) leads to the following expression:



X QZ _ /3T QZ [ 
13(1 cosh/3w 



1 - 2e' 3n cosh (3ui \ (1 - 2a)/3 2 



(1 + 2e^ n cosh/3w) 3 



(39) 



where the transition temperature between the phases Q and 
Z is: 



with 



nQZ 



2e< 3n cosh (3u> 
(l + 2e' 3n cosh/3w) 2 



(40) 



2e' 3n cosh f3w 
1 + 2e@ n cosh /3uj 



+ 0(l 2 ). (41) 



Expanding the above expressions for small values of the
load rate and large temperatures, $\beta\sqrt{\alpha} \ll 1$, and calculating
the averages over the noise up to the leading terms, one
obtains the following equation for the transition line:



i (i + a 2 ) 



9aa 



(42) 



The last expression for T c is in qualitative agreement with 
the previous results shown on Fig. 5. 

Regarding the equation for the order parameter I, one 
can verify that in leading order: 



f>z _ p T QZ l 



1 (1 



27 



^-fl 2 + 0(l 3 ,al 2 ). (43) 



By use of Eq.(42), it is seen that the quadratic term of the
above expansion changes sign when the activity is a = 0.5, thus
defining a tricritical line between the transitions of second
order (a > 0.5) and of first order (a < 0.5). Note that
similar tricritical behavior has also been described in the
other versions of the BEG model H-El). A similar analysis
can also be performed for the other continuous transitions
between the different phases.



6 Conclusions 

In this paper we proposed a BEG-like Hamiltonian for a
ternary neural network, whose couplings arise from an expansion
of its mean-field mutual information, I [20], resulting in a
system evolving with a self-consistently adapting threshold.
The stationary and dynamical equations for this model were
obtained as functions of three order parameters, the overlap
m, the neural activity q, and the activity-overlap n. Their
solutions were explicitly calculated as functions of the
variables: the pattern activity a, the load α and the temperature
T.



When the activity is near a = 2/3, corresponding to
the uniform ternary patterns, the BEGNN improves the
information, compared with a previous SCNN model [19], [20].
An improvement of the information content by increasing the
noise, an effect similar to stochastic resonance, is also
observed for activities a < 2/3.

There are four possible phases for the BEGNN, which
were displayed in phase diagrams a × α and T × α. In
particular, a quadrupolar phase, Q, with m = 0, l ~ 1, holds
whenever the activity is large enough. This phase, known in
the BEG literature but new in an ANN context, carries
some nonzero information about the patterns even without
any overlap m.

As the main result we obtained that, since the phase Z
is not stable in a large range of the variables, the basin of
attraction of the retrieval phase is increased with respect to
the usual ternary neural network models. States with initial
conditions having very small overlap flow to final states with
large overlap.

We believe that the BEGNN has a quite large range
of applications for real systems. We also think that this
way of obtaining a Hamiltonian starting from a mean-field
calculation of I, which yields an almost optimal retrieval
dynamics, can be generalized to other spin systems, such as the
Q-Ising model with Q > 3 or the Potts models, for instance. Such
a method, based on the maximization of the entropy, can be
a universal approach to information systems.

We therefore expect that the same improvement should
happen for analog neurons and for networks of binary
synapses. It would also be interesting to investigate the case
of local fields for multi-neuron synapses, which come from
higher order terms in the expansion of the mutual information,
such that a better use of a network with fixed size is
expected.

Figure 1: The information i = Iα and the order
parameters m, l, q against α with activities
a = 0.3 (left) and a = 0.6 (right). The
temperature is T = 0 and the initial conditions are
$m_0 = 1$, $l_0 = 1$, $q_0 = a$. The continuous line is
for the BEGNN while the dashed line is for the
SCNN.

Figure 2: The information i and the order
parameters m (solid line), l (thin line) and q
(dashed line) against α with activities a = 0.1
(left), a = 0.5 (center) and a = 0.9 (right). The
temperature is T = 0 and the initial conditions
are $m_0 = 10^{-6}$, $l_0 = 1$, $q_0 = a$.

Figure 3: The dynamical phase diagram a × α,
for T = 0 with initial conditions $m_0 = 10^{-6}$,
$l_0 = 1$, $q_0 = a$. The different phases are
explained in the text.

Figure 4: The information i and the order
parameters m (solid line), l (thin line) and q
(dashed line) against α with temperatures T =
0.0 (left), T = 0.2 (center) and T = 0.4 (right).
The activity is a = 0.5 and the initial conditions
are $m_0 = 10^{-6}$, $l_0 = 1$, $q_0 = a$.

Figure 5: The dynamical phase diagram α × T,
for a = 0.7 with initial conditions $m_0 = 10^{-6}$,
$l_0 = 1$, $q_0 = a$. The dashed line corresponds to
the optimal information.

Figure 6: The information i and the order
parameters m, l, q against the time t for
temperature T = 0.2 and activity a = 0.7, with
α = 0.4 (left) and α = 0.2 (right). The initial
conditions are $m_0 = 10^{-6}$, $l_0 = 1$, $q_0 = a$.